ABSTRACT:

This paper considers a class of non-Markovian discrete-time random processes on a finite
state space $\{1,\ldots,d\}$. The transition probabilities at each time are influenced by the
number of times each state has been visited and by a fixed a priori likelihood matrix,
$R$, which is real, symmetric and nonnegative. Let $S_i(n)$ keep track of the number of
visits to state $i$ up to time $n$, and form the fractional occupation vector, $\mathbf{V}(n)$, where
$V_i(n) = S_i(n) / (\sum_{j=1}^d S_j(n))$. It is shown that $\mathbf{V}(n)$ converges to a set of critical points
for the quadratic form $H$ with matrix $R$, and that under nondegeneracy conditions on
$R$, there is a finite set of points such that with probability one, $\mathbf{V}(n) \to \mathbf{p}$ for some $\mathbf{p}$ in
the set. There may be more than one $\mathbf{p}$ in this set for which $P(\mathbf{V}(n) \to \mathbf{p}) > 0$. On the
other hand $P(\mathbf{V}(n) \to \mathbf{p}) = 0$ whenever $\mathbf{p}$ fails in a strong enough sense to be a maximum
for $H$.
1 Introduction



This paper considers a stochastic process in discrete time on a finite state space $\{1,\ldots,d\}$,
in which the probability of a transition to site $j$ increases each time $j$ is visited. To
define the process, let $R$ be a real symmetric $d \times d$ matrix with $R_{ij} \ge 0$ for each $i,j$,
and $\sum_i R_{ij} > 0$ for each $j$. For $n \ge d$, inductively define random variables $Y_n$ and
$\mathbf{S}(n) = (S_1(n),\ldots,S_d(n))$ as follows. Let $S_i(d) = 1$ for $i = 1,\ldots,d$ and let $Y_d = 1$. Let
$\mathcal{F}_n$ be the $\sigma$-field generated by $Y_j : d \le j \le n$ and let $Y_{n+1}$ satisfy

$$P(Y_{n+1} = j \mid \mathcal{F}_n) = R_{Y_n,j}\,S_j(n) \Big/ \sum_i R_{Y_n,i}\,S_i(n).$$

Let $S_i(n+1) = S_i(n) + \delta_{Y_{n+1},i}$. In other words, $\mathbf{S}(n)$ counts one plus the number of times
$Y$ has occupied each state. The sequence of ordered pairs $(Y_n, \mathbf{S}(n))$ is a Markov chain,
whereas the sequence $Y_n$ is not.

Define $\mathbf{V}(n) = \mathbf{S}(n)/n$, so that each $\mathbf{V}(n)$ is an element of the $(d-1)$-simplex $\Delta \subset \mathbb{R}^d$.
(In general, boldface is used for vectors and lightface is used for their components.)
This paper studies the question of when $\mathbf{V}(n)$ converges and to which possible limits.
Since $\mathbf{V}(n)$ may be viewed as an empirical occupation measure for the $Y$ process, this is
essentially asking whether $Y$ obeys a strong law of large numbers. A few remarks about
the model are in order.

The process is meant to model learning behavior. Think of $R_{ij}$ as a set of initial
transition probabilities; each time $Y$ visits site $j$, this choice is positively reinforced,
resulting in transition probabilities proportional to $R_{ij}S_j$. The choice of starting state,
$Y_d = 1$, is arbitrary; also, setting each $S_i(d)$ equal to one is a matter of convenience
and in fact the theorems in this paper are true for any choice of $S_i(d) > 0$ and any
$Y_d \in \{1,\ldots,d\}$. The requirement that $R$ be symmetric may not always be reasonable in
applications, but is essential for our arguments.
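
The definition of the process translates directly into a simulation. The following is a minimal sketch, not part of the original paper: the function name `simulate_vrrw`, the 0-based state labels and the example matrix are illustrative assumptions. It draws each step with probability proportional to $R_{Y_n,j}S_j(n)$ and returns the occupation vector $\mathbf{V}(n) = \mathbf{S}(n)/n$.

```python
import random

def simulate_vrrw(R, steps, seed=0):
    """Simulate the reinforced process for `steps` transitions.

    R is a symmetric nonnegative matrix given as a list of lists.
    States are labelled 0..d-1 here (the text uses 1..d).
    Returns (last state, visit counts S, occupation vector V).
    """
    rng = random.Random(seed)
    d = len(R)
    S = [1] * d   # S_i(d) = 1 for every state
    y = 0         # Y_d = 1 in the text's labelling
    n = d         # time starts at d, so sum(S) == n at every step
    for _ in range(steps):
        # Transition weights R_{y,j} * S_j(n); normalisation is done by choices().
        weights = [R[y][j] * S[j] for j in range(d)]
        y = rng.choices(range(d), weights=weights)[0]
        S[y] += 1
        n += 1
    V = [s / n for s in S]
    return y, S, V

# Hypothetical example matrix with positive entries.
R = [[1, 2, 2], [2, 1, 2], [2, 2, 1]]
y, S, V = simulate_vrrw(R, steps=10_000, seed=1)
print(V)
```

Running the sketch with different seeds gives a rough empirical picture of whether $\mathbf{V}(n)$ settles down, which is the question studied below.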

Similar models have been studied in [3] under the name of random processes with
complete connections. When the entries of $R$ are all one, the model reduces to a Pólya urn
model; the behavior in this case is atypical, since most of our results apply to the "generic"
case where $R$ is invertible. Another similar process called edge-reinforced random walk is
studied in [1, 5, 6, 2]; in that case, transitions from $i$ to $j$ are positively reinforced each
time a transition is made from $i$ to $j$ or $j$ to $i$. Thinking of the process as traversing a
graph with vertices $1,\ldots,d$, this kind of reinforcement keeps track of moves along each
edge of the graph, while the process studied in the present paper keeps track of visits to
each vertex. Strong laws for edge-reinforced random walk can be found in [1, 5, 2].

The remainder of this introductory section motivates and states the main results.
Subsequent sections give proofs of the four results. Examples and open questions are
discussed in the final section.

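Definition 1 For $\mathbf{v} \in \Delta$, let $N_i(\mathbf{v}) = \sum_j R_{ij}v_j$. Abbreviate this by $N_i$ when a particular
vector $\mathbf{v}$ may be understood.

Definition 2 For $\mathbf{v} \in \Delta$, let $H(\mathbf{v}) = \sum_i v_i N_i(\mathbf{v}) = \sum_{i,j} R_{ij}v_iv_j$.

Definition 3 For $\mathbf{v} \in \Delta$ such that $H(\mathbf{v}) > 0$, define a vector $\pi(\mathbf{v}) \in \Delta$ by
$\pi_i(\mathbf{v}) = v_i N_i(\mathbf{v})/H(\mathbf{v})$.

Definition 4 For $\mathbf{v} \in \Delta$ such that $H(\mathbf{v}) > 0$, define a Markov transition matrix $M(\mathbf{v})$
by $M_{ij}(\mathbf{v}) = R_{ij}v_j/N_i$.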
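
Definitions 1 to 4 can be computed directly. The sketch below is illustrative only: the helper names `N`, `H`, `pi`, `M` mirror the symbols of the definitions, and the matrix and the point `v` are arbitrary assumptions. It also evaluates $\pi(\mathbf{v})M(\mathbf{v})$ so that it can be compared with $\pi(\mathbf{v})$.

```python
def N(R, v):
    """N_i(v) = sum_j R_ij v_j."""
    d = len(v)
    return [sum(R[i][j] * v[j] for j in range(d)) for i in range(d)]

def H(R, v):
    """H(v) = sum_i v_i N_i(v)."""
    Nv = N(R, v)
    return sum(v[i] * Nv[i] for i in range(len(v)))

def pi(R, v):
    """pi_i(v) = v_i N_i(v) / H(v)."""
    Nv, h = N(R, v), H(R, v)
    return [v[i] * Nv[i] / h for i in range(len(v))]

def M(R, v):
    """M_ij(v) = R_ij v_j / N_i(v)."""
    Nv = N(R, v)
    d = len(v)
    return [[R[i][j] * v[j] / Nv[i] for j in range(d)] for i in range(d)]

# Hypothetical symmetric nonnegative matrix and an interior point of the simplex.
R = [[1, 2, 0], [2, 1, 3], [0, 3, 1]]
v = [0.2, 0.5, 0.3]
p = pi(R, v)
Mv = M(R, v)
# Row vector pi(v) times M(v).
pM = [sum(p[i] * Mv[i][j] for i in range(3)) for j in range(3)]
print(p, pM)
```

For this choice of `R` and `v` the two printed vectors agree up to rounding, and each row of `Mv` sums to one.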

Note that $H(\mathbf{V}(n))$ is bounded below by $\min\{R_{ij} : R_{ij} > 0\}$. Thus $H$ never vanishes on the
closure of the set of possible values of $\mathbf{V}(n)$, and the clauses about $H$ not vanishing
in the above definitions are merely pro forma. For a fixed $\mathbf{v}$,
$(\pi M)_j = \sum_i \pi_i M_{ij} = \sum_i (v_iN_i/H)(R_{ij}v_j/N_i) = \sum_i v_iv_jR_{ij}/H = v_jN_j/H = \pi_j$,
so $\pi(\mathbf{v})$ is an invariant probability for the transition matrix $M(\mathbf{v})$. The behavior of $\mathbf{V}(n)$
can heuristically be explained as follows.

For $n \gg L \gg 1$, compare $\mathbf{V}(n+L)$ to $\mathbf{V}(n)$. Since $n \gg L$, the $Y$ process between
these times behaves as if $\mathbf{V}$ is not changing, and hence approximates a Markov chain
with transition matrix $M(\mathbf{V}(n))$. Since $L \gg 1$, the occupation measure between these
times will be close to the invariant measure $\pi(\mathbf{V}(n))$. This means that
$\mathbf{V}(n+L) \approx \mathbf{V}(n) + (L/n)(\pi(\mathbf{V}(n)) - \mathbf{V}(n))$. Passing to a continuous time limit gives

$$\frac{d}{dt}\mathbf{V}(t) = \frac{1}{t}\big(\pi(\mathbf{V}(t)) - \mathbf{V}(t)\big). \tag{1}$$

Up to an exponential time change, $\mathbf{V}$ should then behave like an integral curve for the
vector field $\pi - I$. One would expect convergence to a critical point or set and, because of
the random perturbations, one would not expect convergence to any unstable equilibrium.
It is not in general possible to find a potential for this vector field, but the function $H$ is
a Lyapunov function for it. Then one expects convergence of $\mathbf{V}(n)$ to a maximum for $H$.

Definition 5 Let $C \subset \Delta$ be the set of points $\mathbf{v}$ for which $\pi(\mathbf{v}) = \mathbf{v}$. The term critical
point will be used to denote points of $C$. Let $C_0 \subset \Delta$ be the set of points $\mathbf{v}$ for which
$M(\mathbf{v})$ is reducible.

Section 2 will discuss the nature of $C$ and $C_0$, and give conditions under which
Theorem 1.1 (proved in Section 3) implies almost sure convergence of $\mathbf{V}(n)$.

Theorem 1.1 With probability one, $\mathrm{dist}(\mathbf{V}(n), C \cup C_0) \to 0$, where $\mathrm{dist}(x, A)$ denotes
$\inf\{|x - y| : y \in A\}$.

Definition 6 For $\mathbf{v} \in \Delta$, define $\mathrm{face}(\mathbf{v}) = \{\mathbf{w} \in \Delta : \forall i,\ v_i = 0 \text{ implies } w_i = 0\}$ to be
the closed face of $\Delta$ to which $\mathbf{v}$ is interior.
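Definition 7 For any $\mathbf{p} \in C$ that is in a proper face of $\Delta$, say that $\mathbf{p}$ is a linear non-maximum iff

$$D_{\mathbf{p}}H(\mathbf{e}_k - \mathbf{e}_j) > 0 \text{ for some } \mathbf{e}_k \notin \mathrm{face}(\mathbf{p}),\ \mathbf{e}_j \in \mathrm{face}(\mathbf{p}). \tag{2}$$

(Here $\mathbf{e}_1,\ldots,\mathbf{e}_d$ are the standard basis vectors in $\mathbb{R}^d$.)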

The following theorems, proved in Sections 5 and 4 respectively, give conditions under
which convergence to a critical point is impossible.

Theorem 1.2 Suppose that $R$ is nonsingular and let $\mathbf{p}$ be the unique critical point in
the interior of $\Delta$. Then $P(\mathbf{V}(n) \to \mathbf{p}) = 0$ whenever $\mathbf{p}$ fails to be a maximum for $H$.
This happens if and only if $R$ has more than one positive eigenvalue, which happens if
and only if the linear operator $D_{\mathbf{p}}(\pi - I)$ on $-\mathbf{p} + \Delta$ has a positive eigenvalue.

Theorem 1.3 Suppose $\mathbf{p}$ is a linear non-maximum in a proper face of $\Delta$. Then
$P(\mathbf{V}(n) \to \mathbf{p}) = 0$.

A sort of converse to these nonconvergence theorems gives a criterion for convergence
with positive probability of $\mathbf{V}(n)$ to stable critical points. This is proved in Section 3.

Theorem 1.4 Let $A$ be a component of $C$ disjoint from $C_0$ and suppose that $A$ is a local
maximum for $H$ in the sense that there is some neighborhood $\mathcal{N}$ of $A$ for which $\mathbf{v} \in \mathcal{N}$
and $\mathbf{p} \in A$ imply $H(\mathbf{v}) \le H(\mathbf{p})$. Then $P(\mathrm{dist}(\mathbf{V}(n), A) \to 0) > 0$.

2 Preliminaries 

The following proposition verifies that $H$ is a Lyapunov function for the vector field
$\pi - I$ and gives alternate characterizations of the set of critical points. The notation used
throughout for vector calculus is $D_{\mathbf{v}}F(\mathbf{w})$ to denote the derivative of $F$ in the direction
$\mathbf{w}$ at the point $\mathbf{v}$, thus $D_{\mathbf{v}}F$ denotes the linear operator approximating $F(\mathbf{v} + \cdot) - F(\mathbf{v})$.

Lemma 2.1 For any $\mathbf{v} \in \Delta$, $D_{\mathbf{v}}H(\pi(\mathbf{v}) - \mathbf{v}) \ge 0$. Furthermore, the following are
equivalent:

(i) $D_{\mathbf{v}}H(\pi(\mathbf{v}) - \mathbf{v}) = 0$

(ii) $D_{\mathbf{v}}H|_{\mathrm{face}(\mathbf{v})} = 0$

(iii) for those $i$ such that $v_i > 0$, the $N_i$ are equal          (3)

(iv) for all $i$, $v_i = \sum_j R_{ij}v_iv_j/N_j$

(v) $\pi(\mathbf{v}) = \mathbf{v}$

where $0/0 = 0$ in (iv) by convention.

Proof: For fixed $i$ and $j$ and constant $c$, consider the operation of increasing $v_j$ by the
quantity $cv_iv_j(N_j - N_i)$ and decreasing $v_i$ by the same amount. When $c = 1/H(\mathbf{v})$ and
this operation is done simultaneously for every (unordered) pair $i, j$, then the resulting
vector is $\pi(\mathbf{v})$: the new value of the $i^{th}$ coordinate is given by

$$v_i + \frac{1}{H(\mathbf{v})}\Big(\sum_j v_iv_jN_i - \sum_j v_iv_jN_j\Big) = v_i + \frac{1}{H(\mathbf{v})}\big(v_iN_i - v_iH(\mathbf{v})\big) = \pi_i(\mathbf{v}).$$

So an infinitesimal move towards $\pi(\mathbf{v})$ corresponds to doing these additions and
subtractions simultaneously with an infinitesimal $c$. To show that this increases $H$, it suffices to
show that for each unordered pair $i, j$, the value of $H$ is increased, since $H$ is smooth and
therefore well approximated by its linearization near any point. So let $i, j$ be arbitrary.
Writing $\mathbf{v}^{(ij)}$ for the new vector gives, to first order in $c$,

$$\begin{aligned}
H(\mathbf{v}^{(ij)}) &= \sum_{r,s} R_{rs}v_rv_s + 2\sum_s R_{is}\,cv_iv_j(N_i - N_j)\,v_s + 2\sum_r R_{rj}\,cv_iv_j(N_j - N_i)\,v_r \\
&= H(\mathbf{v}) + 2cv_iv_j(N_i - N_j)^2 \\
&\ge H(\mathbf{v})
\end{aligned}$$

so $H$ is nondecreasing. This proves the first part.

For the equivalences, first note that if there are any $i$ and $j$ for which $N_i \ne N_j$ and
neither $v_i$ nor $v_j$ is zero, then $H$ strictly increases. Thus (i) $\Leftrightarrow$ (iii). Since

$$D_{\mathbf{v}}H \text{ is just inner product with the vector } (2N_1, \ldots, 2N_d), \tag{4}$$

and restricting to $\mathrm{face}(\mathbf{v})$ just throws out the coordinates $i$ such that $v_i = 0$, it is easy
to see that (ii) $\Leftrightarrow$ (iii). Assuming (iii), suppose the common value of the $N_i$ is $c$. Then
multiplying (iv) by $c$ gives $\sum_j R_{ij}v_iv_j = c \cdot v_i$, so (iii) $\Rightarrow$ (iv). Now assume (iv). Letting
$M_{\mathbf{v}}$ denote the matrix as well as the Markov chain, (iv) just says that $\mathbf{v}$ is stationary for
$M_{\mathbf{v}}$. Then $\pi(\mathbf{v}) - \mathbf{v} = 0$ so (v) holds. And finally, (v) $\Rightarrow$ (i) trivially. □
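
The first assertion of Lemma 2.1 can be spot-checked numerically. By (4), $D_{\mathbf{v}}H(\pi(\mathbf{v}) - \mathbf{v}) = \sum_i 2N_i(\pi_i - v_i)$. The sketch below is an illustration rather than part of the proof: the function name `lyapunov_derivative`, the sample matrix and the random sampling scheme are assumptions. It evaluates this quantity at random interior points of the simplex.

```python
import random

def lyapunov_derivative(R, v):
    """Return D_v H(pi(v) - v) = sum_i 2 N_i(v) (pi_i(v) - v_i)."""
    d = len(v)
    Nv = [sum(R[i][j] * v[j] for j in range(d)) for i in range(d)]
    h = sum(v[i] * Nv[i] for i in range(d))
    pi_v = [v[i] * Nv[i] / h for i in range(d)]
    return sum(2 * Nv[i] * (pi_v[i] - v[i]) for i in range(d))

rng = random.Random(0)
R = [[1, 2, 0], [2, 1, 3], [0, 3, 1]]  # hypothetical symmetric nonnegative matrix
values = []
for _ in range(1000):
    # Random interior point of the simplex.
    w = [rng.random() + 1e-9 for _ in range(3)]
    total = sum(w)
    v = [x / total for x in w]
    values.append(lyapunov_derivative(R, v))

# The smallest sampled value should be nonnegative up to rounding.
print(min(values))

# At a point where all N_i coincide the derivative vanishes, as in (iii).
R_sym = [[1, 2, 2], [2, 1, 2], [2, 2, 1]]
print(lyapunov_derivative(R_sym, [1 / 3, 1 / 3, 1 / 3]))
```

Algebraically the quantity equals $(2/H)\big(\sum_i v_iN_i^2 - (\sum_i v_iN_i)^2\big)$, a variance of the $N_i$ under the weights $v_i$, which is one way to see the sign.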

Proposition 2.2 The set $C$ has finitely many connected components, each of which is
closed and on each of which $H$ is constant. Furthermore, if all the principal minors of
$R$ are invertible, then $C$ consists of at most $2^d - 1$ points.

Proof: By (3) (ii), $C$ is the union over all $2^d - 1$ faces $F$ of the sets $C_F = \{\mathbf{v} : D_{\mathbf{v}}H|_F = 0\}$.
By (4) and the comment following, $D_{\mathbf{v}}H|_F$ is linear, so $C_F$ is a closed, convex,
connected set. It is easy to see that $H$ is constant on $C_F$ by integrating $D_{\mathbf{v}}H|_F$. The
first part of the proposition follows since each connected component of $C$ is the union of
some of the $C_F$. For the second part, fix a face $F$ and let $R_F$ be the matrix gotten from
$R$ by deleting rows and columns indexed by those $i$ for which $v_i = 0$ for all $\mathbf{v} \in F$. If
this is invertible, then equation (3) (iii) implies that the only possible element of $C$ in the
interior of $F$ is whichever multiple of $(1, \ldots, 1)R_F^{-1}$ lies on the unit simplex. □
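
The second half of the proof is constructive and can be sketched in code: for each face, solve $R_F\mathbf{x} = (1,\ldots,1)$, rescale onto the simplex, and keep the result if all coordinates are positive. The code below is a minimal illustration under the assumption that every $R_F$ met is either invertible or skipped; the function names `solve` and `critical_points` and the example matrix are not from the paper.

```python
from fractions import Fraction
from itertools import combinations

def solve(A, b):
    """Gauss-Jordan elimination over Fractions; returns None if A is singular."""
    n = len(A)
    aug = [[Fraction(x) for x in row] + [Fraction(bi)] for row, bi in zip(A, b)]
    for col in range(n):
        pivot = next((r for r in range(col, n) if aug[r][col] != 0), None)
        if pivot is None:
            return None
        aug[col], aug[pivot] = aug[pivot], aug[col]
        pv = aug[col][col]
        aug[col] = [x / pv for x in aug[col]]
        for r in range(n):
            if r != col and aug[r][col] != 0:
                factor = aug[r][col]
                aug[r] = [x - factor * y for x, y in zip(aug[r], aug[col])]
    return [row[n] for row in aug]

def critical_points(R):
    """List candidate critical points, one per face whose R_F is invertible."""
    d = len(R)
    found = []
    for size in range(1, d + 1):
        for face in combinations(range(d), size):
            RF = [[R[i][j] for j in face] for i in face]
            x = solve(RF, [1] * size)
            if x is None or sum(x) == 0:
                continue
            total = sum(x)
            p = [xi / total for xi in x]
            # Keep only points in the interior of the face.
            if all(c > 0 for c in p):
                full = [Fraction(0)] * d
                for idx, i in enumerate(face):
                    full[i] = p[idx]
                found.append(full)
    return found

R = [[1, 2, 2], [2, 1, 2], [2, 2, 1]]  # hypothetical example
points = critical_points(R)
for pt in points:
    print([str(c) for c in pt])
```

For this example matrix the sketch lists the three vertices, the three edge midpoints and the barycenter, that is, $2^3 - 1 = 7$ points, matching the bound in the proposition.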

If all the off-diagonal entries of $R$ are positive, it is immediate that $M(\mathbf{v})$ is irreducible
for all $\mathbf{v} \in \Delta$. Conversely, if $R_{ij} = 0$ for some $i \ne j$, then $M(\mathbf{v})$ is reducible when $\mathbf{v}$ is
any nontrivial combination of $\mathbf{e}_i$ and $\mathbf{e}_j$. Thus a necessary and sufficient condition for
$C_0$ to be empty is that $R_{ij} > 0$ off of the diagonal. In any event, $C_0$ is a union of proper
faces of $\Delta$. The following corollary to Theorem 1.1 is now immediate.

Corollary 2.3 If all the off-diagonal entries of $R$ are positive and all the principal
minors of $R$ are invertible, then $\mathbf{V}(n)$ converges almost surely.

□

3 Proofs of convergence results 

The proof of Theorem 1.1 begins with a lemma giving a lower bound on the expected
growth of $H(\mathbf{V}(n))$ when $\mathbf{V}(n)$ is not near $C \cup C_0$.

Lemma 3.1 Let $\mathcal{N}$ be a closed subset of the simplex, with $\mathcal{N} \cap (C \cup C_0) = \emptyset$. Then
there exist an $N$, $L$ and $c > 0$ such that for any $n > N$,
$E(H(\mathbf{V}(n+L)) \mid \mathbf{V}(n)) > H(\mathbf{V}(n)) + c/n$ whenever $\mathbf{V}(n) \in \mathcal{N}$.

Proof: For any $n$, let $M_n(n), M_n(n+1), \ldots$ denote a Markov chain beginning at $Y_n$ at
time $n$, whose transition matrix thereafter does not change with time and is given by
$M(\mathbf{V}(n))$. Let $\mathbf{S}'(n) = \mathbf{S}(n)$ and for $i > n$, let $\mathbf{S}'(i) = \mathbf{S}'(i-1) + \mathbf{e}_{M_n(i)}$ where $\mathbf{e}_j$ is the
$j^{th}$ standard basis vector. Let $\mathbf{V}'(i) = \mathbf{S}'(i)/i$.

First I claim that the lemma is true with the Markov process $\mathbf{V}'$ substituted for $\mathbf{V}$.
By Lemma 2.1, $D_{\mathbf{v}}H(\pi(\mathbf{v}) - \mathbf{v})$ is nonzero on $\mathcal{N}$, so by compactness it is bounded
below by some $c_0 > 0$ on $\mathcal{N}$. Choose any $c_1 < c_0$. The occupation measure of a process
between times $N$ and $N + L$ can change by at most $L/(N + L)$ in total variation.
Since $H$ is smooth, it is possible to choose $N/L$ large enough so that whenever $n > N$,
$H[\mathbf{V}'(n) + (L/(n+L))(\pi(\mathbf{V}(n)) - \mathbf{V}(n))] - H(\mathbf{V}'(n)) > c_1L/(n+L)$. By the Markov property,
$(\mathbf{S}'(n+L) - \mathbf{S}(n))/L$ approaches a point-mass at $\pi(\mathbf{v})$ in distribution as $L$ increases. In
fact, the rate of convergence of $M^L(\mathbf{V}(n))\mathbf{w}$ to $\pi(\mathbf{V}(n))$ is exponential and controlled by
the second-largest eigenvalue of $M(\mathbf{V}(n))$ according to the Perron-Frobenius theorem.
If $M(\mathbf{V}(n))$ is aperiodic, then since $M(\mathbf{v})$ varies continuously with $\mathbf{v}$, eigenvalues are
continuous, and the non-degeneracy hypothesis says that $\mathcal{N}$ contains no points where
the second-largest eigenvalue is 1, the second-largest eigenvalue is bounded away from
1. It follows that a large enough $L$ may be chosen uniformly in $\mathbf{v}$ so that
$E(H(\mathbf{V}'(n+L)) - H(\mathbf{V}'(n)) \mid \mathcal{F}_n) > c_2/n$ for any $c_2 < c_1$, and the claim is established. If $M(\mathbf{V}(n))$
is periodic, then it has period 2 and a simple eigenvalue at $-1$; the claim follows in this
case from grouping together pairs of times $2n$ and $2n + 1$.

Now couple the Markov chain $\mathbf{V}'(n+i)$ to $\mathbf{V}(n+i)$ in such a way that the two move
identically for as long as possible. Formally, define $\{M_n(i)\}$ and $\{Y_i\}$ on a common
measure space so that if $Y_j = M_n(j)$ for all $n \le j < n + k$ then

$$P(Y_{n+k} \ne M_n(n+k) \mid Y_{n+k-1} = i) = \sum_j |M_{ij}(\mathbf{V}(n+k)) - M_{ij}(\mathbf{V}(n))|.$$

Picking $c < c_2$ and $N/L$ large enough so that

$$(L^2/N)(L/N)\|DH\|_\infty < (c_2 - c)/N, \tag{5}$$

the coordinates of $\mathbf{V}$ cannot change by more than $L/N$ in $L$ steps, so the probability
of an uncoupling at any of the $L$ steps is bounded by $L^2/N$. Then
$E|H(\mathbf{V}(n+L)) - H(\mathbf{V}'(n+L))| < (c_2 - c)/N$ by (5), and combining this with the earlier claim proves the
lemma. □

Before proving Theorem 1.1, here is a sketch of the argument. On any set $\mathcal{N}$ away
from $C \cup C_0$, Lemma 3.1 says the expected value of $H(\mathbf{V}(n))$ grows, provided you sample at
time intervals of size $L$. The cumulative differences between $H(\mathbf{V}(n+L))$ and
$E(H(\mathbf{V}(n+L)) \mid \mathbf{V}(n))$ form a convergent martingale, so $H(\mathbf{V}(n))$ itself is growing at rate $c/n$ when
$\mathbf{V}(n) \in \mathcal{N}$. The rate of change in position of $\mathbf{V}(n)$ is also order $1/n$ per step, so if $\mathbf{V}$
goes from one given point of $\mathcal{N}$ to another, $H(\mathbf{V}(n))$ increases by an amount independent
of time. The only way it can decrease again is for $\mathbf{V}(n)$ to leave $\mathcal{N}$ at a place where
$H$ is large and re-enter where $H$ is small. The effect of such a possibility can be made
arbitrarily small because $H$ is nearly constant on the connected components of $\Delta \setminus \mathcal{N}$.

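Proof of Theorem 1.1: Since the connected components, $C_1, \ldots, C_k$ of $C \cup C_0$ are closed,
$m = \min\{d(C_i, C_j) : i \ne j\} > 0$. Pick any $r < m/3$. Let

$$\mathcal{N}_1^i = \{\mathbf{v} : d(\mathbf{v}, C_i) < r\}$$

$$\mathcal{N}_1 = \Delta \setminus \bigcup_{i=1}^k \mathcal{N}_1^i. \tag{6}$$

Note that

$$i \ne j \Rightarrow d(\mathcal{N}_1^i, \mathcal{N}_1^j) > r. \tag{7}$$

By the preceding lemma with $\mathcal{N} = \mathcal{N}_1$, $c_1, L_1, N_1$ can be found for which $n > N_1$ implies
$E(H(\mathbf{V}(n+L_1)) \mid \mathbf{V}(n)) > H(\mathbf{V}(n)) + c/n$, where $c = c_1$. Pick any $L' > L_1$ and define

$$\mathcal{N}_2^i = \mathcal{N}_1^i \cap \{\mathbf{v} : |H(\mathbf{v}) - H(C_i)| < rc/2L'\}$$

$$\mathcal{N}_2 = \Delta \setminus \bigcup_{i=1}^k \mathcal{N}_2^i.$$

Figure 1 gives an example of these definitions when $d = 3$; the heavy lines are the
boundary of $\mathcal{N}_1$ and the lighter lines are the boundary of $\mathcal{N}_2$.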

Apply the lemma to $\mathcal{N}_2$ to get $N_2$, $c_2$ and $L_2$. Define the process $\{\mathbf{U}(n)\}$ that samples
$\mathbf{V}(n)$ at intervals of $L_1$ on $\mathcal{N}_1$ and $L_2$ elsewhere, by

$$\mathbf{U}(n, \omega) = \mathbf{V}(f(n, \omega))$$

where $f(1, \omega) = \max\{N_1, N_2\}$ and

$$f(n+1, \omega) = \begin{cases} f(n, \omega) + L_1 & \text{if } \mathbf{V}(f(n, \omega)) \in \mathcal{N}_1; \\ f(n, \omega) + L_2 & \text{if } \mathbf{V}(f(n, \omega)) \notin \mathcal{N}_1. \end{cases}$$

Clearly, $\mathbf{U}(n)$ converges if and only if $\mathbf{V}(n)$ converges. Letting $U(n) = H(\mathbf{U}(n))$, write
$U(n) = M(n) + A(n)$ where $\{M(n)\}$ is a martingale and $\{A(n)\}$ is a predictable process
with respect to $\mathcal{F}_{f(n)}$. The key properties needed are

$$M(n) \text{ converges almost surely} \tag{8}$$

$$A(n+1) \ge A(n) + c/n \text{ if } \mathbf{U}(n) \in \mathcal{N}_1 \tag{9}$$

$$A(n+1) \ge A(n) \text{ if } \mathbf{U}(n) \in \mathcal{N}_2. \tag{10}$$

To verify (8), note that $|U(n+1) - U(n)| \le \max\{L_1, L_2\}/f(n) = O(1/n)$, since by (4),
$H$ is Lipschitz on $\Delta$. Then $|M(n+1) - M(n)| = O(1/n)$ as well, so $M(n)$ converges in
$L^2$, hence almost surely. Properties (9) and (10) are evident from the construction.

The next thing to show is Claim 1: $\mathbf{U}(n) \in \mathcal{N}_2^a$ infinitely often for at most one $a$
almost surely. Consider any sample path $\mathbf{U}(1), \mathbf{U}(2), \ldots$. For $n < t$, define the event
$B(a, b, n, t, \omega)$ to occur if

$$\mathbf{U}(n) \in \mathcal{N}_2^a \text{ and } \mathbf{U}(t) \in \mathcal{N}_2^b \text{ with } \mathbf{U}(i) \in \mathcal{N}_2 \text{ for all } i \text{ such that } n < i < t. \tag{11}$$

If $B(a, b, n, t, \omega)$ occurs, let

$$r = \max\{i : n \le i < t \text{ and } \mathbf{U}(i) \in \mathcal{N}_1^a\} \quad\text{and}\quad s = \min\{i : n < i \le t \text{ and } \mathbf{U}(i) \in \mathcal{N}_1^b\}$$

be respectively the last exit time of $\mathcal{N}_1^a$ and the first entrance time of $\mathcal{N}_1^b$. The dotted
path in figure 1 gives an example of this. By (9) and (10),

$$A(i+1) - A(i) \ge c/i \text{ for } r < i < s$$
$$A(i+1) - A(i) \ge 0 \text{ for } n < i < t.$$



Then

$$\begin{aligned}
A(t) - A(n) &= [A(t) - A(s)] + [A(s) - A(r+1)] + [A(r+1) - A(n+1)] + [A(n+1) - A(n)] \\
&\ge 0 + \Big(\sum_{i=r+1}^{s-1} c/i\Big) + 0 - L_2/n \\
&= O(1/n) + (c/L_1)\sum_{i=r}^{s-1} L_1/i \\
&\ge O(1/n) + (c/L_1)\sum_{i=r}^{s-1} |\mathbf{U}(i+1) - \mathbf{U}(i)| \\
&\ge O(1/n) + (c/L_1)|\mathbf{U}(s) - \mathbf{U}(r)| \\
&\ge O(1/n) + rc/L_1
\end{aligned}$$

by (7). Now $U(t) - U(n) \le H(C_b) - H(C_a) + rc/L'$ by the construction of $\mathcal{N}_2$. So
$M(t) - M(n) \le H(C_b) - H(C_a) + rc/L' - rc/L_1 + O(1/n)$. If $H(C_b) \le H(C_a)$, the choice
of $r$ guarantees that this expression is strictly negative and bounded away from 0 for
large $n$. Therefore if $M(n)(\omega)$ converges, then $B(a, b, n, t, \omega)$ happens only finitely often
for $a, b$ such that $H(C_b) \le H(C_a)$. But then it happens only finitely often for any $a \ne b$
since $\mathbf{U}$ can make only $k - 1$ successive transitions from $\mathcal{N}_2^a$ to $\mathcal{N}_2^b$ with $H(C_b) > H(C_a)$.
Thus the almost sure convergence of $M(n)$ implies that $\mathbf{U}(n) \in \mathcal{N}_2^a$ infinitely often for
at most one $a$ almost surely and Claim 1 is shown.

In other words, transitions between small neighborhoods of $C_i$ and $C_j$ eventually cease
for $i \ne j$. Claim 2 is that $\mathbf{V}(n)$ may not oscillate between a small neighborhood of $C_i$
and a set bounded away from $C$. To show this, require now that $r < m/6$. With $\mathcal{N}_1$
and $\mathcal{N}_2$ defined as before, define $\mathcal{N}_3 \subset \mathcal{N}_1$ by (6) with $2r$ in place of $r$. Since $2r < m/3$,
equation (7) holds with $\mathcal{N}_3$ in place of $\mathcal{N}_1$. An argument identical to the one establishing
Claim 1 now shows that with probability 1 there are only finitely many values of $n$ and
$t$ for which

U(n) e M", U(i) e 7V3 and V{t) e iorn<i<t. 

[The argument again: $A(i)$ is nondecreasing when $\mathbf{U}(n) \in \mathcal{N}_1^a$ and increases by at least
the fixed amount $rc/L_1$ each time $\mathbf{U}$ makes the transit from $\mathcal{N}_1^a$ to $\mathcal{N}_3$. The increase in
$A$ is greater than the greatest difference in values of $H$ taken at two points of $\mathcal{N}_2^a$, so
the martingale $M$ must change by at least $rc/L_1 - rc/L'$ during every transit. Since $M$
converges, this happens finitely often.]

Claim 3 is that the event $\{\omega : \mathbf{U}(t, \omega) \in \mathcal{N}_1 \text{ for all } t > n\}$ has probability 0 for each
$n$; it is proved in an identical manner. Putting together Claims 1 and 3, it follows that
for any small $r$ there is precisely one $a$ for which $\mathbf{U}(n) \in \mathcal{N}_1^a$ infinitely often. Then by
Claim 2 for a different $r$, $\mathcal{N}_3$ stops being visited, so letting $r \to 0$ proves the theorem. □

The proof of Theorem 1.4 is just an easier version of the proof of Theorem 1.1.

Sketch of proof of Theorem 1.4: A process $\mathbf{U}(n)$ may be defined as in the previous proof,
so that $\mathbf{V}(n)$ converges iff $\mathbf{U}(n)$ converges and so that $U(n) := H(\mathbf{U}(n))$ breaks into a
martingale $M(n)$ and a predictable process $A(n)$. Note that the argument showing a
bound of $c/n$ on $M(\infty) - M(n)$ still works conditionally on $\mathbf{U}(n)$. By a standard maximal
inequality, given any $\epsilon > 0$, an $n$ may be chosen large enough so that
$P(\inf\{M(n+i) - M(n) : i > 0\} < -\epsilon \mid \mathbf{U}(n)) < \epsilon$. The assumptions of the theorem imply the existence of an
$\epsilon$ for which the component $B$ of $H^{-1}[a - 2\epsilon, a]$ is disjoint from $(C \cup C_0) \setminus A$, where $a$ is
the value of $H$ on $A$. Now for sufficiently large $n$, the event $\mathbf{U}(n) \in H^{-1}[a - \epsilon, a] \cap B$ has
positive probability. Conditional on this event, the probability that $M(n+i) - M(n)$ ever
goes below $-\epsilon$ has been shown to be less than $\epsilon$ for large $n$. Since $\mathrm{dist}(\mathbf{U}(n), C \cup C_0) \to 0$
by Theorem 1.1, and $\mathbf{U}(n)$ cannot leave $B$ without $U(n)$ becoming less than $a - 2\epsilon$, it
follows that $\mathrm{dist}(\mathbf{U}(n), A) \to 0$ with positive probability, proving the theorem. □
4 Proof of Theorem 1.3



To prove Theorem 1.3, begin by seeing why it should be true. With $\mathbf{p}$ as in the statement
of the theorem, equation (3) (iii) says that the $N_i$ have a common value, $\lambda$, for those $i$
such that $p_i > 0$. Assuming (2) for a given $k$ and using equation (4) for $D_{\mathbf{p}}H$ shows that
$N_k > N_j = \lambda$. So

$$\sum_i R_{ki}p_i/N_i = \sum_{p_i > 0} R_{ki}p_i/\lambda = N_k/\lambda = 1 + b \tag{12}$$

for some $b > 0$, $k$ such that $p_k = 0$. Now when $\mathbf{V}(n)$ is close to $\mathbf{p}$, $v_k(n)$ will be close to
but not equal to zero. The expected number of visits to state $k$ during a period of time
from $n$ to $n + T$ in which the occupation measure is close to $\mathbf{p}$ will be approximately
$T\sum_i p_i(R_{ik}v_k/N_i) = Tv_kN_k/\lambda = (1 + b)Tv_k$. In other words, $v_k$ will begin to increase
and $\mathbf{p}$ should be an unstable point with no possibility of $\mathbf{V}(n)$ converging there. The
actual proof will consist of making this rigorous.
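
As a small worked illustration of (12), the quantity $b = N_k/\lambda - 1$ can be computed for a critical point on a proper face; a positive value for some $k$ with $p_k = 0$ is exactly condition (2), since $D_{\mathbf{p}}H(\mathbf{e}_k - \mathbf{e}_j) = 2(N_k - N_j)$ by (4). The function name `instability_margins` and the example matrix below are assumptions made for this sketch.

```python
def instability_margins(R, p, tol=1e-12):
    """For a critical point p on a proper face, map each k with p_k == 0
    to b_k = N_k(p) / lambda - 1, where lambda is the common value of
    N_i(p) over the support of p."""
    d = len(p)
    Np = [sum(R[i][j] * p[j] for j in range(d)) for i in range(d)]
    support = [i for i in range(d) if p[i] > 0]
    lam = Np[support[0]]
    # Criticality check from (3)(iii): the N_i agree on the support of p.
    if any(abs(Np[i] - lam) > tol for i in support):
        raise ValueError("p is not a critical point")
    return {k: Np[k] / lam - 1 for k in range(d) if p[k] == 0}

R = [[1, 2, 2], [2, 1, 2], [2, 2, 1]]  # hypothetical example
vertex_margins = instability_margins(R, [1.0, 0.0, 0.0])
edge_margins = instability_margins(R, [0.5, 0.5, 0.0])
print(vertex_margins, edge_margins)
```

In this example every margin is positive, so both the vertex and the edge midpoint are linear non-maxima in the sense of Definition 7.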

To avoid bogging down in trivialities, $\mathbf{S}(n)$ and $\mathbf{V}(n)$ will be used to stand for $\mathbf{S}(\lfloor n \rfloor)$
and $\mathbf{V}(\lfloor n \rfloor)$. Inequalities will be verified as if $n$ were an integer; it is always possible
to choose epsilons and deltas a little bit smaller to compensate for the roundoff errors.
Begin by recording a few propositions whose proofs are omitted when elementary.

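Proposition 4.1 Fix $\mathbf{p}$ and let $\mathcal{N}_1$ be a neighborhood of $\mathbf{p}$. For any $\delta > 0$ there is a
neighborhood $\mathcal{N}$ of $\mathbf{p}$ included in $\mathcal{N}_1$ such that for all $n > 1/\delta$, the two conditions

(i) $\mathbf{V}(n) \in \mathcal{N}$ and
(ii) $\mathbf{V}(n + \delta n) \in \mathcal{N}$

imply

(iii) $(\mathbf{S}(n + \delta n) - \mathbf{S}(n))/\delta n \in \mathcal{N}_1$.

□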
The heuristic calculation at the beginning of this section is made precise as follows.

Proposition 4.2 Let $\mathbf{p}, k, b$ be such that (12) holds and let $\mathbf{S}$ be any vector function of
$n$. Then there is an $\epsilon > 0$ and a neighborhood $\mathcal{N}_1 = \{\mathbf{v} \in \Delta : |\mathbf{v} - \mathbf{p}| < \epsilon\}$ such that for
all $\delta > 0$ and for all $n$, the conditions $\mathbf{V}(n) \in \mathcal{N}_1$ and $(S_i(n + \delta n) - S_i(n))/\delta n > p_i - \epsilon$
for all $i$ imply

$$\sum_i (S_i(n + \delta n) - S_i(n))\,\frac{R_{ki}\,v_k(n)}{(1 + \delta)N_i(n)} > \delta\,\frac{1 + b/2}{1 + \delta}\,S_k(n). \tag{13}$$

Proof: As $\epsilon \to 0$, $1/n$ times the left-hand side converges to
$\delta v_k(n)/(1 + \delta) \sum_i p_iR_{ki}/N_i = \delta v_k(n)(1 + b)/(1 + \delta)$ while $1/n$ times the right-hand side
is $\delta v_k(n)(1 + b/2)/(1 + \delta)$. Since the convergence is uniform in $\delta$, the result follows. □

Proposition 4.3 Let $b > 0$ and $\epsilon_1 > 0$ be given. Let $\{B_\alpha\}$ be a collection of independent
Bernoulli random variables with $E(\sum_\alpha B_\alpha) \ge (1 + b)L$. There exists an $L_0$ such that
whenever $L > L_0$, $P(\sum_\alpha B_\alpha/L > 1 + b/2) > 1 - \epsilon_1$. □

Proof of Theorem 1.3: By hypothesis, condition (2) holds, and hence (12) holds for some
choice of $\mathbf{p}$, $k$ and $b$ which are fixed hereafter. Pick $\epsilon$ and $\mathcal{N}_1$ according to Proposition 4.2.
Apply Proposition 4.1 to $\mathcal{N}_1$ and $\mathbf{p}$ with $\delta = 1 \wedge ((1 + b/2)/(1 + b/4) - 1)$ to obtain a
neighborhood $\mathcal{N}$ of $\mathbf{p}$ with the appropriate properties. Temporarily fixing $n$, define the
event $\mathcal{B}_n$ by $\mathbf{V}(i) \in \mathcal{N}$ for all $n \le i \le (1 + \delta)n$. Define stopping times $\{\tau_{i,r}\}$ and a family
of Bernoulli random variables $\{B_{i,r}\}$ as follows.

Let $\tau_{i,r} \le \infty$ be the $r^{th}$ time after $n$ that $Y_j = i$, so formally $\tau_{i,0} = n$ and
$\tau_{i,r+1} = \inf\{j > \tau_{i,r} : Y_j = i\}$. Let $B_{i,r}$ be independent and Bernoulli with

$$P(B_{i,r} = 1) = R_{ki}v_k(n)/(1 + \delta)N_i(n) \tag{14}$$

and coupled to the variables $\{Y_j\}$ so that if $B_{i,r} = 1$ and $\tau_{i,r} < (1 + \delta)n$ then $Y_{\tau_{i,r}+1} = k$.
To verify that this construction is possible, check that the probability of a transition from
vertex $i$ to vertex $k$ never drops below the quantity in (14):

$$P(Y_{\tau_{i,r}+1} = k \mid \mathcal{F}_{\tau_{i,r}}) \ge (1/(1 + \delta))\,R_{ki}v_k(n)/N_i(n)$$

for $\tau_{i,r} < (1 + \delta)n$.

Now consider the subcollection $A = \{(i, r) : r \le \delta n(p_i - \epsilon)\}$. By Proposition 4.1,
$\tau_{i,r} < (1 + \delta)n$ whenever the event $\mathcal{B}_n$ holds. Meanwhile,

$$E\Big(\sum_{\alpha \in A} B_\alpha\Big) = \sum_i \delta n(p_i - \epsilon)\,R_{ki}v_k(n)/(1 + \delta)N_i(n).$$

By Proposition 4.2, this quantity is at least $\delta(1 + b/2)S_k(n)/(1 + \delta)$ which is at least
$\delta(1 + b/4)S_k(n)$ by choice of $\delta$. Apply Proposition 4.3 to the collection $\{B_\alpha : \alpha \in A\}$,
with $b$ replaced by $b/4$ and $\epsilon_1$ to be chosen later to obtain a value for $L_0$. Now calculate
the conditional expectation $E(\ln(v_k((1 + \delta)n)) \mid \mathcal{F}_n, S_k(n) > L_0)$. By Proposition 4.3 and
the coupling,

$$P(\mathcal{B}_n \text{ and } S_k((1 + \delta)n) - S_k(n) > \delta(1 + b/8)S_k(n) \mid \mathcal{F}_n, S_k(n) > L_0) \ge P(\mathcal{B}_n \mid \mathcal{F}_n, S_k(n) > L_0) - \epsilon_1.$$

When $S_k((1 + \delta)n) - S_k(n) > \delta(1 + b/8)S_k(n)$, it follows that
$v_k((1 + \delta)n) > v_k(n)(1 + b/8(1 + \delta)) \ge v_k(n)(1 + b/16)$. Therefore

$$\begin{aligned}
E(\ln(v_k((1 + \delta)n)) \mid \mathcal{F}_n, S_k(n) > L_0) \ge{}& (P(\mathcal{B}_n \mid \mathcal{F}_n, S_k(n) > L_0) - \epsilon_1)\ln((1 + b/16)v_k(n)) \\
&+ (1 - P(\mathcal{B}_n \mid \mathcal{F}_n, S_k(n) > L_0) + \epsilon_1)\ln(v_k(n)/(1 + \delta))
\end{aligned} \tag{15}$$

$$\ge \ln(v_k(n)) + \ln(1 + b/32) - K\,P(\mathcal{B}_n^c \mid \mathcal{F}_n, S_k(n) > L_0) \tag{16}$$
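for $K = \ln[(1 + \delta)(1 + b/16)]$, when $\epsilon_1$ is sufficiently small. To conclude from this that
$P(\mathbf{V}(n) \to \mathbf{p} \text{ and } S_k(n) > L_0 \text{ for some } n) = 0$, write $T(n) = (1 + \delta)^nL_0$, $\mathcal{G}_n = \mathcal{F}_{T(n)}$,
$X_n = \ln(v_k(T(n)))$, $\beta = c/2K$, $\tau = \inf\{n : \mathbf{V}(i) \notin \mathcal{N} \text{ for some } L_0 \le i \le T(n)\}$ and
calculate

$$\begin{aligned}
EX_{n \wedge \tau} &= X_0 + \sum_{i=0}^{n-1} E\big(1_{\tau > i}(X_{i+1} - X_i) \mid \mathcal{G}_i\big) \\
&\ge X_0 + \sum_{i=0}^{n-1} E\,1_{\tau > i}\Big[(c - K\beta) \cdot 1_{P(\tau = i+1 \mid \mathcal{G}_i) < \beta} - K \cdot 1_{P(\tau = i+1 \mid \mathcal{G}_i) \ge \beta}\Big] \quad \text{by equation (16)} \\
&\ge X_0 + \sum_{i=0}^{n-1} E\,1_{\tau > i}\Big[(c - K\beta)\big(1 - \beta^{-1}P(\tau = i+1 \mid \mathcal{G}_i, \tau > i)\big) - K\beta^{-1}P(\tau = i+1 \mid \mathcal{G}_i, \tau > i)\Big] \\
&\ge X_0 + \sum_{i=0}^{n-1} \Big[(c - K\beta)\,E\,1_{\tau > i} - \beta^{-1}(c + K - K\beta)P(\tau = i+1)\Big] \\
&\ge X_0 + n(c - K\beta)P(\tau > n) - \beta^{-1}(c + K - K\beta).
\end{aligned}$$

Since $c - K\beta$ was chosen to be positive, $P(\tau > n)$ must go to zero, showing that $\mathbf{V}(n) \to \mathbf{p}$
and $S_k(n) > L_0$ eventually is impossible.

Finally, to show that $P(\mathbf{V}(n) \to \mathbf{p} \text{ and } S_k(n) \le L_0 \text{ for all } n) = 0$, note that since
$N_k(\mathbf{p}) > 0$, there is a sufficiently small neighborhood $\mathcal{N}$ of $\mathbf{p}$ for which
$P(Y_{n+1} = k \mid \mathcal{F}_n, \mathbf{V}(n) \in \mathcal{N})$ is always at least a constant times $n^{-1}$. Borel-Cantelli implies that
$k$ is visited infinitely often whenever $\mathbf{V}(n)$ remains in $\mathcal{N}$, and this finishes the proof of
Theorem 1.3. □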






5 Proof of Theorem 1.2 



Begin with a proof of the equivalences:

$\mathbf{p}$ fails to be a maximum for $H$
$\Leftrightarrow$ $R$ has more than one positive eigenvalue
$\Leftrightarrow$ $D_{\mathbf{p}}(\pi - I)$ has a positive eigenvalue.

The matrix $R$ can be viewed as a symmetric bilinear form whose quadratic form gives
$H$ when restricted to $\Delta$. Let $W = \Delta - \mathbf{p}$ be the translation of $\Delta$ containing the origin.
For $\mathbf{w} \in W$,

$$R(\mathbf{w}, \mathbf{p}) = \mathbf{w}^TR\mathbf{p} = \mathbf{w} \cdot \lambda \cdot (1, \ldots, 1) = 0$$

where $\lambda$ is the common value of the $N_i$. Then

$$R(\mathbf{w} + c\mathbf{p}, \mathbf{w} + c\mathbf{p}) = R(\mathbf{w}, \mathbf{w}) + R(c\mathbf{p}, c\mathbf{p}) = R|_W(\mathbf{w}) + c^2\lambda \tag{17}$$

so the quadratic form $R(\mathbf{v}, \mathbf{v})$ decomposes into the sum of $R|_W$ and a positive form on
the one-dimensional subspace spanned by $\mathbf{p}$. Then $R$ has precisely one more positive
eigenvalue than the quadratic form $R|_W$. But equation (17) with $\mathbf{w} = \mathbf{v} - \mathbf{p}$ shows that
$H(\mathbf{v}) = R|_W(\mathbf{v} - \mathbf{p}) + \lambda$ so $H$ has a strict maximum at $\mathbf{p}$ if and only if $R|_W$ has a
strict maximum at the origin. Since $R$ has no zero eigenvalues, $R|_W$ will have a strict
maximum when it has a maximum, which happens when it has no positive eigenvalues.

For the second equivalence, note that $\pi$ is smooth on the interior of $\Delta$, so $D_{\mathbf{p}}(\pi - I)$
exists. Let $T$ be the operator on $\mathbb{R}^d$ whose matrix in the standard basis is given by

$$T_{ij} = p_iR_{ij}/\lambda.$$

I claim that $T = D_{\mathbf{p}}(\pi - I)$ on $W$. Indeed, using Definition 3 to define $\pi$ on all of $\mathbb{R}^d$
and differentiating shows that the matrix representation for $D_{\mathbf{p}}(\pi - I)$ is given by

$$[D_{\mathbf{p}}(\pi - I)]_{ij} = \frac{\partial}{\partial e_j}\bigg|_{\mathbf{v} = \mathbf{p}}\Big(\frac{v_iN_i}{H} - v_i\Big) = \frac{p_iR_{ij}}{\lambda} - 2p_i$$

(using the fact that all the $N_i$ have a common value $\lambda = H(\mathbf{p})$ and the identity $\partial H/\partial e_j = 2N_j$).

Then the matrices for $T$ and $D_{\mathbf{p}}(\pi - I)$ differ by a matrix with constant rows, hence define
the same operator on $W$. Now let $\mathrm{diag}(\mathbf{p})$ be the diagonal matrix with $i,i$ entry equal
to $p_i$ and observe that $T = \mathrm{diag}(\mathbf{p})R/\lambda$. Since $R$ is symmetric and $\mathrm{diag}(\mathbf{p})$ is positive
definite, $T$ must be diagonalizable with real eigenvalues and has the same signature as
$R$ (see [4, Theorem 6.23 and 6.24 page 232]). Since $T$ has $\mathbf{p}$ as an eigenvector with positive
eigenvalue and $W$ as an invariant subspace, it has one more positive eigenvalue than
$D_{\mathbf{p}}(\pi - I)$ and the conclusion follows. □
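
The claim that $T$ and $D_{\mathbf{p}}(\pi - I)$ differ by a matrix with constant rows can be checked numerically by finite differences. The sketch below is illustrative: the helper names, the step size `h` and the example matrix (whose interior critical point is the barycenter) are assumptions. Differentiating Definition 3 suggests that row $i$ of the difference should be constant and equal to $2p_i$.

```python
def pi(R, v):
    """pi_i(v) = v_i N_i(v) / H(v), extended to all of R^d by the same formula."""
    d = len(v)
    Nv = [sum(R[i][j] * v[j] for j in range(d)) for i in range(d)]
    h = sum(v[i] * Nv[i] for i in range(d))
    return [v[i] * Nv[i] / h for i in range(d)]

def jacobian_pi_minus_I(R, p, h=1e-6):
    """Central finite-difference approximation of D_p(pi - I)."""
    d = len(p)
    J = [[0.0] * d for _ in range(d)]
    for j in range(d):
        up = list(p)
        up[j] += h
        dn = list(p)
        dn[j] -= h
        pu, pd = pi(R, up), pi(R, dn)
        for i in range(d):
            J[i][j] = (pu[i] - pd[i]) / (2 * h) - (1.0 if i == j else 0.0)
    return J

R = [[1, 2, 2], [2, 1, 2], [2, 2, 1]]  # hypothetical example
p = [1 / 3, 1 / 3, 1 / 3]              # interior critical point of this R
lam = sum(R[0][j] * p[j] for j in range(3))
T = [[p[i] * R[i][j] / lam for j in range(3)] for i in range(3)]
J = jacobian_pi_minus_I(R, p)
diff = [[T[i][j] - J[i][j] for j in range(3)] for i in range(3)]
Tp = [sum(T[i][j] * p[j] for j in range(3)) for i in range(3)]
print(diff)
print(Tp)
```

A matrix with constant rows annihilates every vector whose coordinates sum to zero, which is why the two operators agree on $W$; the last line also shows $T\mathbf{p} = \mathbf{p}$ for this example.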

To finish proving Theorem 1.2, it remains to show that $\mathbf{V}(n)$ cannot converge to an
interior point where $D_{\mathbf{p}}(\pi - I)$ has a positive eigenvalue. The method of proof is from
[7], the first step being a construction of a scalar function which measures "distance from
$\mathbf{p}$ in an unstable direction" ([7, Proposition 3]).

Lemma 5.1 Under the assumptions of Theorem 1.2, suppose that $D_{\mathbf{p}}(\pi - I)$ has a
positive eigenvalue. Then there is a function $\eta$ from a neighborhood of $\mathbf{p}$ to $[0, \infty)$ such
that $D_{\mathbf{v}}\eta(\pi(\mathbf{v}) - \mathbf{v}) \ge k_1\eta(\mathbf{v})$ in a neighborhood of $\mathbf{p}$ for a constant $k_1 > 0$. Furthermore,
$\eta$ is the square root of a smooth function (whose gradient necessarily vanishes
whenever the function vanishes) but whose second partials are not all zero (thus $\eta$ is
not differentiable where it vanishes). It follows from this that $\eta$ is Lipschitz and that
$\eta(\mathbf{v} + \mathbf{w}) \ge \eta(\mathbf{v}) + D_{\mathbf{v}}\eta(\mathbf{w}) + k_2|\mathbf{w}|^2$ in a neighborhood of $\mathbf{p}$, where $D_{\mathbf{v}}\eta$ may be any
of the support hyperplanes to the graph of $\eta$ at points where $\eta$ vanishes. □

Use this lemma and a sequence of appropriately chosen stopping times to convert
questions about convergence of $\mathbf{V}(n)$ into questions about the convergence of a scalar
stochastic process. To do this, fix a neighborhood $\mathcal{N}$ of $\mathbf{p}$ in which all coordinates
are bounded away from zero. Let $L(\mathbf{v})$ be the mean recurrence time to state 1 for
the Markov chain $M(\mathbf{v})$ and let $L_{max} = \sup_{\mathbf{v} \in \mathcal{N}} L(\mathbf{v})$. Pick $N_0 > 2L_{max}$ and define
$\sigma_0 = \inf\{k \ge N_0 : Y_k = 1\}$ and $\sigma_{n+1} = \inf\{k > \sigma_n : Y_k = 1\}$ to be the successive hitting
times for state 1. Let $\tau = \inf\{k \ge N_0 : \mathbf{V}(k) \notin \mathcal{N}\}$ and let $\tau_n = \tau \wedge \sigma_n$. For the remainder
of the section, let $E$ and $P$ denote conditional expectation and conditional probability
with respect to $\mathcal{F}_{N_0}$. The following facts are elementary.

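Proposition 5.2

(i) The distribution of $\tau_{n+1} - \tau_n$ has finite conditional expectation and
variance. Specifically,

$$P(\tau_{n+1} - \tau_n > k \mid \mathcal{F}_{\tau_n}) \le e^{-ak}$$

for some $a > 0$.

(ii) For any $\epsilon > 0$, $N_0$, there is a constant $c_1$ such that
$P(n < \tau_n < c_1n \text{ for all } \tau_n < \tau) > 1 - \epsilon$.

(iii) For any $\epsilon > 0$, $\gamma < 1$, $N_0$ and $r$ may be chosen large enough so that
$P(\tau_{n+1} - \tau_n > n^{1-\gamma} \text{ for some } n > r) < \epsilon$.

□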

Let $\mathbf{U}(n) = \mathbf{V}(\tau_n)$, let $S_n = \eta(\mathbf{U}(n))$ and let $X_n = S_n - S_{n-1}$. The following estimate
shows that the expected increment in $\mathbf{U}$ from time $n$ to $n + 1$ is close to the value given
by the Markov approximation.

Proposition 5.3 For any $n > 0$,

$$E(\mathbf{U}(n+1) - \mathbf{U}(n)) - L(\mathbf{U}(n))\,\frac{\pi(\mathbf{U}(n)) - \mathbf{U}(n)}{\tau_n} = O(\tau_n^{-2}). \tag{18}$$



Proof: Couple the process $\{Y_i : i \ge \tau_n\}$ to a Markov chain $Y'$ with $Y'_{\tau_n} = 1$ and transition
matrix $M(\mathbf{U}(n))$ in such a way that the two processes remain identical for as long as
possible. Define $\mathbf{V}'$, $\mathbf{S}'$, $\tau'$ and $\mathbf{U}'$ analogously to the unprimed variables. Establish first
that

$$E|\mathbf{U}(n+1) - \mathbf{U}'(n+1)| = O(\tau_n^{-2}). \tag{19}$$

To see this, observe that since transition probabilities for $Y$ and $Y'$ differ by at most $k/\tau_n$
at time $\tau_n + k$, the conditional probability of the two processes uncoupling before time
$\tau_{n+1}$ is at most

$$\sum_{k > 0} P(\tau_{n+1} - \tau_n > k)\,k/\tau_n < e^{-a}(1 - e^{-a})^{-2}\,\tau_n^{-1} \tag{20}$$

according to Proposition 5.2 (i). On the other hand, $E(\tau'_{n+1} - (\tau_n + k) \mid \mathcal{F}_{k+\tau_n})$ and
$E(\tau_{n+1} - (\tau_n + k) \mid \mathcal{F}_{k+\tau_n})$ are bounded by $L_{max}$ and $(1 - e^{-a})^{-1}$ respectively on the event
of the uncoupling occurring at time $k + \tau_n$, which implies that

$$\begin{aligned}
E(|\mathbf{V}(\tau_{n+1}) - \mathbf{V}'(\tau'_{n+1})| &\mid \text{uncoupling before } \tau_{n+1}) \\
&\le \sup_k E\big(|\mathbf{V}(\tau_{n+1}) - \mathbf{V}(\tau_n)| + |\mathbf{V}'(\tau'_{n+1}) - \mathbf{V}(\tau_n)| \,\big|\, \text{uncoupling occurs at } \tau_n + k\big) \\
&\le \sup_k (1/\tau_n)\,E\big(\tau_{n+1} - \tau_n - k + 1 + \tau'_{n+1} - \tau_n - k + 1 \,\big|\, \text{uncoupling occurs at } \tau_n + k\big) \\
&\le (1/\tau_n)\big(L_{max} + 1/(1 - e^{-a}) + 2\big).
\end{aligned} \tag{21}$$

Combining (20) and (21) gives (19).
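
The quantity $\mathbf{U}(n+1) - \mathbf{U}(n)$ in the LHS of equation (18) may now be replaced by
the quantity $\mathbf{U}'(n+1) - \mathbf{U}(n)$, since the two are within $O(\tau_n^{-2})$ in expectation. Since $Y'$
is a Markov chain, the following identity holds:

\[ E(\mathbf{S}'(\tau'_{n+1}) - \mathbf{S}(\tau_n)) = L(\mathbf{U}(n))\,\pi(\mathbf{U}(n)). \qquad (22) \]

Component by component, we then have

\[ \begin{aligned}
E(U'_i(n+1) - U_i(n))
&= E\left( \frac{1}{\tau_n}\left( S'_i(\tau'_{n+1}) - S_i(\tau_n) - (\tau'_{n+1} - \tau_n)\,S_i(\tau_n)/\tau_n \right) \right) - Q \\
&= \frac{1}{\tau_n}\,L(\mathbf{U}(n))\,[\pi_i(\mathbf{U}(n)) - U_i(n)] - Q
\end{aligned} \]

according to (22), where

\[ Q = E\left( \frac{\left( S'_i(\tau'_{n+1}) - S_i(\tau_n) - (\tau'_{n+1} - \tau_n)\,S_i(\tau_n)/\tau_n \right)(\tau'_{n+1} - \tau_n)}{\tau_n\,\tau'_{n+1}} \right). \]

The denominator of $Q$ is at least $\tau_n^2$ and the numerator is bounded by the product of
two geometric random variables according to Proposition 5.2, so $|Q| = O(\tau_n^{-2})$ and the
proposition is proved. □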

Use this estimate to prove the following proposition, which together with Lemma 5.5
proves Theorem 1.2.



Proposition 5.4 Let $S_n$ and $X_n$ be defined from $\mathbf{V}$ as above. Let $\mathcal{N}$ remain fixed as in
the paragraph before Proposition 5.2. For any $\epsilon > 0$ there are constants $b_1, b_2, c > 0$ and
$\gamma > 1/2$ and an $N$ such that whenever $N_0 > N$ then

\[ P(B \mid \mathcal{F}_{N_0}) > 1 - \epsilon, \qquad (23) \]

where $B$ is the event that either equations (24) - (27) are satisfied for all $n > N_0$ or else
$\mathbf{V}(n)$ at some point leaves $\mathcal{N}$.

\[ E(X_{n+1}^2 + 2X_{n+1}S_n) \ge b_1/n^2 \qquad (24) \]

\[ E(X_{n+1}\,S_n\,\mathbf{1}_{S_n > c/n}) \ge 0 \qquad (25) \]

\[ P(|X_{n+1}| \le 1/(n+1)^{\gamma}) = 1 \qquad (26) \]

\[ E(X_{n+1}^2) \le b_2/n^2 \qquad (27) \]
Lemma 5.5 If (23) holds for a nonnegative stochastic process $S_n = S_0 + \sum_{i=1}^{n} X_i$, then
$P(S_n \to 0) = 0$.

Lemma 5.5 is a variant on an argument from [7], whose proof can be outlined as
follows.

First assume that (23) holds with $\epsilon = 0$, i.e. that (24) - (27) hold almost surely, and
show in the following three steps (A)-(C) that $P(S_n \to 0) = 0$. Let $k$ be any positive
real number less than $\sqrt{2b_1}$ and without loss of generality restrict $n$ to be at least $4c^2/k^2$
so that $k/(2\sqrt{n}) \ge c/n$, where $c$ is the constant in condition (25).

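(A) Claim: given any $S_n$, the probability of finding $S_M > k/\sqrt{n}$ for some $M > n$ is at
least 1/2.

Proof: Assume without loss of generality that $S_n \le k/\sqrt{n}$. Let $\sigma$ be the
first $i > n$ for which $S_i > k/\sqrt{n}$. Then for any $M > n$,

\[ \begin{aligned}
E(S^2_{\sigma \wedge M})
&\ge \sum_{i=n}^{M-1} E(S^2_{\sigma \wedge (i+1)} - S^2_{\sigma \wedge i}) \\
&= \sum_{i=n}^{M-1} E(\mathbf{1}_{\sigma > i}\,(X_{i+1}^2 + 2X_{i+1}S_i)) \\
&\ge P(\sigma > M) \sum_{i=n}^{M-1} b_1/i^2 \qquad \text{by (24)} \\
&\ge P(\sigma > M)\,b_1/n.
\end{aligned} \]

But condition (26) implies that $S_{\sigma \wedge M}$ never gets much more than $k/\sqrt{n}$ and
since $k^2 < 2b_1$ this forces $P(\sigma > M) < 1/2$ and the claim is proved.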

(B) Claim: given that $S_n > k/\sqrt{n}$ the probability that $S_M$ will never return to the
interval $x < k/\sqrt{n}$ for $M > n$ is at least $a = 4b_2/(4b_2 + k^2)$.

Proof: Assume $S_n > k/\sqrt{n}$. Let $\sigma$ be the first $i > n$ for which $S_i < k/(2\sqrt{n})$.
By condition (25) and the fact that $S_{(\sigma-1) \wedge i} \ge k/(2\sqrt{n}) \ge c/n$, the sequence
$S_{\sigma \wedge i}$ is a submartingale. Decompose this into a mean-zero martingale and an
increasing process. Summing equation (27) shows that the martingale is bounded
in $L^2$, with variance at most $b_2/n$. Then by using the one-sided Tschebysheff
estimate $P(f - Ef \le -s) \le \mathrm{Var}(f)/(\mathrm{Var}(f) + s^2)$, the probability that the
martingale ever reaches the interval $(-\infty, -k/(2\sqrt{n}))$ is at most $4b_2/(4b_2 + k^2)$.
The martingale is a lower bound for the submartingale so the claim is proved.

(C) If $S_n$ converges to 0 with non-zero probability, then there is an $n$ which can be chosen
arbitrarily large and an event $A \in \mathcal{F}_n$ for which $P(S_n \to 0 \mid A)$ is arbitrarily close to 1.
When it is greater than $1 - a/2$, this contradicts (A) and (B).

Now assume (23) instead of (24) - (27). For any $N_0$, let $\sigma$ be the first $n > N_0$ for
which $\mathbf{V}(n)$ exits $\mathcal{N}$ or one of the conditions (24) - (27) is violated; $\sigma$ is a stopping time
since the conditions are $\mathcal{F}_n$-measurable. Let $\{X^*_n, S^*_n : n \ge N_0\}$ be any process that
always satisfies (24) - (27) and is coupled to the process $\{X_n, S_n : n \ge N_0\}$ so that
the two processes are equal for $n < \sigma$. Since $S^*_n$ cannot converge to 0, $S_n \to 0$ implies
$\sigma < \infty$. For $\epsilon > 0$ let $N_0$ be chosen as in (23). Then with probability at least $1 - \epsilon$ either
$S_n$ does not converge to 0 or $\mathbf{V}(n)$ exits $\mathcal{N}$. Thus the probability of $\mathbf{V}(n)$ converging to
$\mathbf{p}$ without ever exiting $\mathcal{N}$ after time $N_0$ is at most $\epsilon$. Since $\epsilon$ is arbitrary, it follows that
$P(\mathbf{V}(n) \to \mathbf{p}) = 0$. □

The last step in the proof of Theorem 1.2 is to establish Proposition 5.4. For any
$\epsilon > 0$ and $\gamma < 1$, condition (26) may be satisfied by choosing $N_0$ at least as large as the
$N_0$ in Proposition 5.2 (iii) (using the fact that $\eta$ is Lipschitz). Also, (27) follows directly
from Proposition 5.2 (i) for any $\epsilon$. To prove (25), let $\epsilon > 0$ and use the bounds on $\tau_n$
from Proposition 5.2 (ii) to get

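\[ \begin{aligned}
E(X_{n+1}) &= E(S_{n+1}) - S_n \\
&= E(\eta(\mathbf{U}(n) + [\mathbf{U}(n+1) - \mathbf{U}(n)])) - S_n \\
&\ge E\left( \eta(\mathbf{U}(n)) + D_{\mathbf{U}(n)}\eta\,[\mathbf{U}(n+1) - \mathbf{U}(n)] + O|\mathbf{U}(n+1) - \mathbf{U}(n)|^2 \right) - S_n
  \qquad \text{by Lemma 5.1} \\
&= D_{\mathbf{U}(n)}\eta\,E[\mathbf{U}(n+1) - \mathbf{U}(n)] + E(O|\mathbf{U}(n+1) - \mathbf{U}(n)|^2) \\
&= D_{\mathbf{U}(n)}\eta \left( \frac{L(\mathbf{U}(n))}{\tau_n}\,(\pi - I)\mathbf{U}(n) + O(\tau_n^{-2}) \right)
  + E(O|\mathbf{U}(n+1) - \mathbf{U}(n)|^2)
  \qquad \text{by Proposition 5.3} \\
&= \frac{L(\mathbf{U}(n))}{\tau_n}\,D_{\mathbf{U}(n)}\eta\,((\pi - I)(\mathbf{U}(n))) + O(\tau_n^{-2})
\end{aligned} \]

since $\mathbf{U}(n+1) - \mathbf{U}(n)$ is of order $\tau_n^{-1}$ and $\eta$ is Lipschitz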

\[ \begin{aligned}
&\ge \mathrm{const} \cdot \frac{\eta(\mathbf{U}(n))}{\tau_n} + O(\tau_n^{-2}) \\
&\ge \frac{c_1 S_n}{n} - \frac{c_2}{n^2} \qquad (28)
\end{aligned} \]

for some $c_1, c_2 > 0$ by Proposition 5.2 (ii) with probability $1 - \epsilon$.

Thus there is a constant $c = c_2/c_1$ such that for $S_n > c/n$ the first term of (28) dominates.
Hence (25) is true with probability at least $1 - \epsilon$.

Finally, to show (24), note that it suffices to show that $E(X_{n+1}^2) \ge c_3/n^2$ for some $c_3$,
assuming $\tau_n < r$. For, in the case that $S_n > c_2/(c_1 n)$, (25) holds, implying (24), while if
$S_n < c_2/(c_1 n)$, (28) is at least $-2c_2/n^2$ and the second term on the left hand side of (24) is
at least $-4c_2^2/(c_1 n^3)$, and for large enough $n$ this is dwarfed by the $E(X_{n+1}^2)$ term.

Now a moment's thought shows that $E(X_{n+1}^2)$ must be at least of order $n^{-2}$. From the
nonvanishing second partials of $\eta^2$, it follows that there is a unit vector $\mathbf{w} \in W$ such
that $|\eta(\mathbf{v} + r\mathbf{w}) - \eta(\mathbf{v})| > Cr$ for some positive $C$ uniformly in $\mathbf{v}$ in a neighborhood of
$\mathbf{p}$. There exists a positive multiple of $\mathbf{w}$ and a fixed sequence of sites in $\{2, \ldots, d\}$, such
that if these are the sites visited between times $\tau_n$ and $\tau_{n+1}$, then $\mathbf{U}(n+1) - \mathbf{U}(n)$ will
be arbitrarily close to this multiple of $\mathbf{w}$. This sequence of visits happens with positive
probability, so (24) holds, establishing Proposition 5.4 and Theorem 1.2. □

6 Examples and further questions 

Example 1: Suppose $R_{ij} = 1 - \delta_{ij}$. The critical set $C$ contains just the centroids of the
faces, and the degeneracy set $C_0$ is empty, so by Corollary 2.3, $\mathbf{V}(n)$ converges almost
surely to some point of $C$. It is easy to see that the centroids of all proper faces are linear
nonmaxima. For example, if $\mathbf{p} = (1/3, 1/3, 1/3, 0, 0, \ldots)$ then $N_i = 2/3$ for $i \le 3$ and 1
for $i > 3$. Thus Theorem 1.3 implies $\mathbf{V}(n) \to (1/d, \ldots, 1/d)$ almost surely.
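
The behaviour described in Example 1 can be explored numerically. The following is a minimal simulation sketch, not part of the original argument: it assumes the transition rule from the Introduction (move from the current state y to j with probability proportional to R[y][j] times the current count of j), starts every count at 1 and the walk at the first state, and uses 0-based state labels. The function name `simulate_vrrw` and the `seed` parameter are illustrative choices.

```python
import random


def simulate_vrrw(R, steps, seed=0):
    """Simulate the reinforced process and return the occupation vector V(n).

    R is a symmetric nonnegative matrix given as a list of lists.
    Counts start at 1 for every state, as in the definition of S(d).
    """
    rng = random.Random(seed)
    d = len(R)
    S = [1] * d   # S_i counts one plus the number of visits to state i
    y = 0         # current state (the paper's state 1, with 0-based labels)
    for _ in range(steps):
        # Unnormalised transition weights R[y][j] * S_j; choices() normalises.
        weights = [R[y][j] * S[j] for j in range(d)]
        y = rng.choices(range(d), weights=weights)[0]
        S[y] += 1
    n = sum(S)
    return [s / n for s in S]


# Example 1 with d = 3: R_ij = 1 - delta_ij.
d = 3
R = [[0 if i == j else 1 for j in range(d)] for i in range(d)]
V = simulate_vrrw(R, steps=20000, seed=1)
print(V)
```

For this matrix the printed vector should lie near the centroid (1/d, ..., 1/d), in line with the conclusion of Example 1; a single finite run is of course only suggestive.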

Example 2: Here is an example where $\lim_{n\to\infty} \mathbf{V}(n)$ is not deterministic. Suppose

\[ R = \begin{pmatrix} 3 & 1 & 1 \\ 1 & 2 & 4 \\ 1 & 4 & 2 \end{pmatrix}. \]

All the minors of $R$ are invertible and off-diagonal elements nonzero, so Corollary 2.3 applies
and $\mathbf{V}(n)$ converges almost surely to a point of $C$. The interior critical point, which is
proportional to $(1, 1, 1)R^{-1}$, is $(1/2, 1/4, 1/4)$; it is unstable because $R$ has two positive
eigenvalues, so the probability of convergence there is zero. The critical points in the middle
of two of the edges, $(1/3, 2/3, 0)$ and $(1/3, 0, 2/3)$, are linear nonmaxima, as are the vertices
$(0, 1, 0)$ and $(0, 0, 1)$, so the probability of convergence to each of these points is zero by
Theorem 1.3. On the other hand, $(1, 0, 0)$ is a local maximum for $H$, as is $(0, 1/2, 1/2)$, so by
Theorem 1.4, it follows that $P(\mathbf{V}(n) \to (1, 0, 0)) = 1 - P(\mathbf{V}(n) \to (0, 1/2, 1/2)) = a$ for some
$0 < a < 1$.
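
The list of critical points in Example 2 can be reproduced by exact arithmetic. The sketch below rests on two assumptions that are not spelled out in this section: that a critical point of H supported on a face F is a positive solution of R_F x = (1, ..., 1), normalised to sum to one, and that a point is flagged as a linear nonmaximum when some N_i = (R v)_i off its support exceeds H(v) = v . R v, which is the comparison made in Example 1. The helper names are hypothetical.

```python
from fractions import Fraction
from itertools import combinations


def solve(A, b):
    """Solve A x = b exactly by Gauss-Jordan elimination; None if singular."""
    n = len(A)
    M = [[Fraction(x) for x in row] + [Fraction(bi)] for row, bi in zip(A, b)]
    for col in range(n):
        pivot = next((r for r in range(col, n) if M[r][col] != 0), None)
        if pivot is None:
            return None
        M[col], M[pivot] = M[pivot], M[col]
        M[col] = [x / M[col][col] for x in M[col]]
        for r in range(n):
            if r != col and M[r][col] != 0:
                factor = M[r][col]
                M[r] = [x - factor * y for x, y in zip(M[r], M[col])]
    return [M[r][n] for r in range(n)]


def critical_points(R):
    """Candidate critical points of H on the simplex, one face at a time."""
    d = len(R)
    points = []
    for size in range(1, d + 1):
        for face in combinations(range(d), size):
            sub = [[R[i][j] for j in face] for i in face]
            x = solve(sub, [1] * size)
            if x is None or any(xi <= 0 for xi in x):
                continue  # no strictly positive solution on this face
            total = sum(x)
            v = [Fraction(0)] * d
            for i, xi in zip(face, x):
                v[i] = xi / total
            points.append(tuple(v))
    return points


def looks_like_linear_nonmaximum(R, v):
    """Assumed criterion: some N_i off the support of v exceeds H(v)."""
    d = len(R)
    N = [sum(R[i][j] * v[j] for j in range(d)) for i in range(d)]
    H = sum(v[i] * N[i] for i in range(d))
    return any(v[i] == 0 and N[i] > H for i in range(d))


R = [[3, 1, 1], [1, 2, 4], [1, 4, 2]]
for p in critical_points(R):
    label = "linear nonmaximum" if looks_like_linear_nonmaximum(R, p) else "not flagged"
    print([str(x) for x in p], label)
```

Under these assumptions the four flagged points are the two edge points and the two vertices named in the example, while (1, 0, 0), (0, 1/2, 1/2) and the interior point are not flagged; the instability of the interior point comes from the eigenvalue condition, which this sketch does not test.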

Example 3: Let $G$ be a finite abelian group and let $T$ be a set of generators for $G$
closed under inverse. Let $R$ be the incidence matrix for the Cayley graph of $(G, T)$.
By symmetry, the point $\mathbf{p} = (1/|G|, \ldots, 1/|G|)$ is in $C$. The eigenvalues of $R$ are just
$\lambda(\chi) \stackrel{\mathrm{def}}{=} \sum_{g \in T} \chi(g)$, as $\chi$ ranges over the characters of $G$. If these are all nonzero, then
$\mathbf{p}$ is the unique critical point in the interior of $\Delta$. In this case, $P(\mathbf{V}(n) \to \mathbf{p})$ is zero
or not according to whether $\lambda(\chi) > 0$ for any nontrivial character $\chi$. In fact it is easy
to verify that $P(\mathbf{V}(n) \to \mathbf{p})$ is always zero or one when the principal minors of $R$ are
invertible, by checking that the negativity of $\lambda(\chi)$ for all nontrivial $\chi$ implies that each
other critical point is a linear nonmaximum.
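
The character-sum formula for the eigenvalues in Example 3 is easy to check numerically in the simplest case. The sketch below assumes G is the cyclic group Z_m, whose characters are chi_r(g) = exp(2 pi i r g / m); the choice m = 5, T = {1, 4} and the function names are illustrative only.

```python
import cmath


def cayley_matrix(m, T):
    """Adjacency matrix of the Cayley graph of Z_m with generating set T."""
    return [[1 if (b - a) % m in T else 0 for b in range(m)] for a in range(m)]


def character_sums(m, T):
    """lambda(chi_r) = sum over g in T of exp(2 pi i r g / m), r = 0..m-1."""
    return [sum(cmath.exp(2j * cmath.pi * r * g / m) for g in T) for r in range(m)]


def residual(m, T, r):
    """Largest entry of |R x - lambda x| for the character vector x_a = chi_r(a)."""
    R = cayley_matrix(m, T)
    lam = character_sums(m, T)[r]
    x = [cmath.exp(2j * cmath.pi * r * a / m) for a in range(m)]
    Rx = [sum(R[a][b] * x[b] for b in range(m)) for a in range(m)]
    return max(abs(Rx[a] - lam * x[a]) for a in range(m))


m, T = 5, {1, 4}   # the 5-cycle; T is closed under inverse since 4 = -1 mod 5
lams = character_sums(m, T)
for r, lam in enumerate(lams):
    print(r, round(lam.real, 6), residual(m, T, r))
```

The residuals should vanish up to rounding, confirming that each character is an eigenvector with eigenvalue lambda(chi). For the 5-cycle some nontrivial character has lambda(chi) > 0, which is the quantity the criterion in Example 3 asks about.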

There are many natural unanswered questions about the behavior of $\mathbf{V}(n)$. One could
of course ask for rates of convergence, central limit behavior, etc., but I think it is more
important both from a mathematical and a modeling point of view to try to extend the
results already obtained so as to cover all matrices $R$. For example, when $R$ is a matrix
of all ones, every point of $\Delta$ is critical so Theorem 1.1 says nothing, while comparison to
a Polya urn model shows that $\mathbf{V}(n)$ converges almost surely to a random point of $\Delta$ with
an absolutely continuous distribution. In general, when $C$ has components larger than a
point, one expects the motion of $\mathbf{V}$ inside a component to be martingale-like and hence
still converge to a single point, this time with a nonatomic distribution. Also, while the
symmetry assumption on $R$ is vital to the proofs (since it allows $\pi(\mathbf{v})$ to be explicitly
calculated) I do not believe that it is actually necessary for the results.
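
The role of symmetry in computing $\pi(\mathbf{v})$ can be illustrated with exact arithmetic. The sketch below assumes, following the transition rule in the Introduction, that M(v) has entries R_ij v_j / N_i(v) with N_i(v) = sum_k R_ik v_k, and that the explicit stationary vector is pi_i(v) = v_i N_i(v) / H(v) with H(v) = sum_i v_i N_i(v). These formulas are an assumption drawn from detailed balance for symmetric R, not quoted from this section, and the test vectors are arbitrary.

```python
from fractions import Fraction


def linear_terms(R, v):
    """Return N(v) = R v and H(v) = v . R v."""
    d = len(R)
    N = [sum(R[i][k] * v[k] for k in range(d)) for i in range(d)]
    H = sum(v[i] * N[i] for i in range(d))
    return N, H


def transition_matrix(R, v):
    """Assumed form of M(v): M[i][j] = R[i][j] * v[j] / N_i(v)."""
    d = len(R)
    N, _ = linear_terms(R, v)
    return [[R[i][j] * v[j] / N[i] for j in range(d)] for i in range(d)]


def candidate_stationary(R, v):
    """Assumed explicit formula pi_i(v) = v_i N_i(v) / H(v)."""
    N, H = linear_terms(R, v)
    return [v[i] * N[i] / H for i in range(len(R))]


def is_stationary(R, v):
    """Check pi M(v) = pi exactly."""
    d = len(R)
    M = transition_matrix(R, v)
    pi = candidate_stationary(R, v)
    piM = [sum(pi[i] * M[i][j] for i in range(d)) for j in range(d)]
    return piM == pi


R_sym = [[3, 1, 1], [1, 2, 4], [1, 4, 2]]           # the matrix of Example 2
v = [Fraction(1, 5), Fraction(3, 10), Fraction(1, 2)]
print(is_stationary(R_sym, v))

R_asym = [[1, 2], [1, 1]]                            # a non-symmetric matrix
w = [Fraction(1, 2), Fraction(1, 2)]
print(is_stationary(R_asym, w))
```

With the symmetric matrix the candidate vector is exactly stationary, because v_i R_ij v_j is symmetric in i and j; for the non-symmetric matrix the same formula fails, which is one concrete way to see why the proofs lean on symmetry.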

Conjecture 1 $\lim_{n\to\infty} \mathbf{V}(n)$ exists almost surely without any nondegeneracy assumptions
on $R$.

Conjecture 2 Theorem 1.1 holds whether or not $R$ is symmetric. Also, when $R$ is
not symmetric, there is a function $H$ such that the first part of Lemma 2.1 holds and
Theorem 1.2 holds.